I. INTRODUCTION

Breast mass segmentation is arguably one of the most difficult tasks in the development of Computer-Aided
Diagnostic (CADx) systems. The main objective of this research is to develop an image segmentation
method for mammograms that contain dense tissue as well as for mammograms that contain dense/fatty
tissue, while its second objective is to incorporate the segmentation method into a CADx system.
Specifically, we intend to do the following: (1) to develop an automatic image segmentation scheme to
separate clinically occult breast masses from surrounding tissue; (2) to evaluate the method by comparing
the ROIs with mammographers’ drawings; and (3) to separate masses from glandular tissues using the
Multiple Circular Path Convolution Neural Network (MCPCNN) classifier.

II.  BODY 

During the past 12 months the PI has tested an automatic image segmentation algorithm on a set of dense
breast mass cases. This section of the annual summary provides a detailed description of the experiment
and is divided into the following sections: (A) Segmentation Method, an overview of the automated image
segmentation method (please see the Appendix for a detailed description of the method); (B) Database and
Experiments, a description of the masses used and the experiments performed; (C) Results, statistical and
graphical results of the experiment; (D) Discussion of Results; and (E) Future Work.

A.  Segmentation  Method 

The  segmentation  method  used  in  this  study  evaluates  the  steepest  changes  within  a  probabilistic  cost 
function  in  an  effort  to  determine  the  computer  segmented  contour  which  is  most  closely  correlated  with 
expert  radiologist  manual  traces.  It  segments  breast  masses  by  combining  region  growing  with  the 
analysis  of  a  probability-based  function  [1].  Once  a  set  of  contours  is  grown  using  region  growing  the 
probability  density  functions  inside  and  outside  the  contours  are  found.  A  function,  which  is  the  logarithm 
of  these  probability  density  functions,  is  then  constructed.  The  function  is  then  searched  for  possible  steep 
change  locations,  i.e.,  sharp  changes  in  the  logarithm  values,  and  the  intensities  corresponding  to  those 
locations  are  likely  to  produce  contours  which  are  highly  correlated  with  expert  traces.  A  detailed 
description  of  the  method  is  provided  in  the  manuscripts  located  in  the  appendix  of  this  document  [2,  3]. 
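
The region growing step that produces the set of candidate contours can be illustrated with a minimal sketch. It assumes 4-connected growth from a single seed pixel with gray value as the only similarity criterion; the function names, the seed, and the toy image are hypothetical and are not taken from the manuscripts.

```python
from collections import deque


def grow_region(image, seed, threshold):
    """Return the 4-connected set of pixels reachable from `seed`
    whose gray value is at least `threshold` (assumed criterion)."""
    rows, cols = len(image), len(image[0])
    if image[seed[0]][seed[1]] < threshold:
        return set()
    region = {seed}
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            inside_image = 0 <= nr < rows and 0 <= nc < cols
            if inside_image and (nr, nc) not in region and image[nr][nc] >= threshold:
                region.add((nr, nc))
                queue.append((nr, nc))
    return region


def contour_sequence(image, seed, thresholds):
    """One grown region per intensity threshold, highest threshold first,
    so each region contains the one before it."""
    return [(t, grow_region(image, seed, t)) for t in sorted(thresholds, reverse=True)]


# Toy 5x5 "mass": bright center, dimmer periphery, dark background.
image = [
    [10, 10, 10, 10, 10],
    [10, 60, 70, 60, 10],
    [10, 70, 90, 70, 10],
    [10, 60, 70, 60, 10],
    [10, 10, 10, 10, 10],
]
regions = contour_sequence(image, seed=(2, 2), thresholds=[80, 65, 50, 5])
sizes = [len(region) for _, region in regions]
print(sizes)
```

Lowering the threshold enlarges the region; the last threshold in the toy example is low enough that the region spills over the whole image, which is the behavior described later in this report as flooding.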

B.  Database  and  Experiments 

The PI has selected 342 cases from the University of South Florida’s Digital Database for Screening
Mammography (DDSM) [4], where 175 of these cases are cancerous masses and 167 of the cases are
benign masses. The densities of all cases from the DDSM have been rated according to the American
College of Radiology’s (ACR) density scale, which ranges from 1 to 4. A breast containing a great deal of
fatty tissue would receive a rating of 1 and a breast containing a great deal of dense tissue would receive a
rating of 4. The current database contains 242 cases with a density rating of 3 and 100 cases with a density
rating of 4. In the current experiment the cost likelihood function threshold values (TV1 and TV2) were set
to 1800 and 1300, respectively. Approximately 300 of the cases were manually traced by one expert
radiologist (Expert 1), while 46 of the cases were manually traced by a second expert radiologist (Expert
2). One hundred ninety-eight of the cases have been validated by one expert radiologist, while 45 of the
cases have been validated by the second expert radiologist. The validation statistics are overlap,
accuracy, sensitivity, and specificity as described in the manuscripts [2, 3].

C.  Results 

1.  Statistical  Results 

As described in the manuscripts [2, 3], the cost likelihood function was used to narrow a large set of contour
choices to three contours which are highly correlated with the expert radiologist traces. Figures 1-4 are
plots of the percentages of cases whose overlap and accuracy statistics fall within specific data
ranges for all three contour choices, namely, groups 1, 2, and 3. For example, the light gray bar
corresponding to the label “0-0.09” shows the percentage of cases for the group 1 trace that fall between 0
and 0.09. All masses for the plots shown used Expert 1’s traces for validation.

Figure 1 - Percentage of Cases for Ranges of Overlap Values (Cancer Cases)
[Plot title: Overlap Measure for All Groups (Cancer Cases, Expert 1); x-axis: Ranges of Values]

Figure 2 - Percentage of Cases for Ranges of Accuracy Values (Cancer Cases)

Figure 3 - Percentage of Cases for Ranges of Overlap Values (Benign Cases)

Figure 4 - Percentage of Cases for Ranges of Accuracy Values (Benign Cases)

Table 1 contains mean values of overlap and accuracy statistics for all contour groups for both cancerous
and benign masses.

Table 1 - Mean Value of Overlap and Accuracy Statistics for Groups 1, 2, and 3 (Expert 1)

            Cancer Cases            Benign Cases
            Overlap    Accuracy     Overlap    Accuracy
Group 1     0.29       0.73         0.32       0.82
Group 2     0.46       0.78         0.52       0.87
Group 3     0.46       0.76         0.51       0.82

2.  Visual  Results 

Figures 5-10 show results for cases in which there were strong, average, and low correlations between the
computer-segmented results and Expert 1’s manual traces for both cancer and benign cases. In cases for
which Expert 2 has not yet traced the mass, the figure indicates “no data”. Figures 11 and 12 show
results for cases where there is a great deal of disagreement between the Expert 1 and Expert 2 traces.


Figure 5 - Cancer Case Results for Groups 1, 2, and 3 (Strong Correlation)

Figure 6 - Cancer Case Results for Groups 1, 2, and 3 (Average Correlation)

Figure 7 - Cancer Case Results for Groups 1, 2, and 3 (Low Correlation)

Figure 8 - Benign Case Results for Groups 1, 2, and 3 (Strong Correlation)

Figure 10 - Benign Case Results for Groups 1, 2, and 3 (Low Correlation)

Figure 11 - Disagreement Among Experts 1 and 2 (Cancer Cases)

Figure 12 - Disagreement Among Experts 1 and 2 (Benign Cases)


D.  Discussion  of  Results 

Previous  experiments  showed  that  in  most  cases,  the  contour  produced  by  the  intensity  corresponding  to 
the  first  steep  change  location  in  the  cost  function  best  matched  the  expert  radiologist  traces  regarding  the 
overlap  and  accuracy  statistics.  This  finding  was  verified  via  Analysis  of  Variance  (ANOVA)  testing, 
where  p-values  ranged  from  1.03x10’^  -  7.51xl0‘’^.  However,  the  mean  values  listed  in  table  1  of  this 
experiment  revealed  that  the  group  2  and  group  3  traces  performed  equally  well  regarding  the  overlap 
statistic. The accuracy results were slightly higher for group 2 in comparison to group 3. Figures 1-4
reveal that there was a larger percentage of cases for which the group 2 and group 3 traces achieved
higher overlap and accuracy values (approximately 0.6 and above), in comparison to the group 1 traces,
which achieved lower overlap and accuracy values (approximately 0.5 and below). This leads us to
believe that the group 2 and group 3 traces generally match expert manual traces best. This
observation is consistent with the visual results shown in Figures 5-10. We obtained higher
overlap and accuracy values in previous experiments because the boundaries of the masses in
those data sets were clearer than those for the dense mass cases.

We also recognize that for cases in which the intensity values outside the perceived boundaries of the
mass are greater than or equal to values inside the mass, the region growing technique produces contours
that have grown into those areas. We refer to this phenomenon as flooding because the contour “floods”
into fibroglandular tissue that is not actual mass tissue. Furthermore, areas inside the mass that have low
intensity values are excluded from contour growth, so some mass tissue is missed during the region
growing phase. These are limitations of the current segmentation algorithm and we plan to address these
issues during the next phase of this research work.

Figures 11 and 12 indicate that large differences in expert interpretations of mass boundaries can exist,
which motivates us to raise questions about the consistency and reliability of expert traces.
In some cases there is strong agreement between the computer trace and Expert 1’s trace, while in other
cases there is strong agreement between the computer trace and Expert 2’s trace, so it is important to
analyze expert opinions prior to further optimizing the segmentation method with regard to image filtering.

E.  Future  Work 

The statement of work states that the PI would filter the images following the experiment performed
during the past several months; however, the reliability of the expert traces has come into question and
should be addressed prior to the optimization of the algorithm. During the next phase of this research
work, the PI will compare the computer-segmented results to each observer as well as compare the
experts’ traces to each other in an effort to answer the questions about expert reliability and consistency.

III.  KEY  RESEARCH  ACCOMPLISHMENTS 

•  Selected  nearly  350  mass  cases  with  density  ratings  of  3  and  4  from  the  University  of  South  Florida’s 
Digital  Database  for  Screening  Mammography 

•  Collected  approximately  100  cases  from  the  Georgetown  University  Medical  Center  image  database 

•  300  masses  have  been  delineated  by  one  expert  radiologist 

•  46  masses  have  been  delineated  by  a  second  expert  radiologist  (this  radiologist  has  agreed  to  delineate 
the  remaining  cases) 

•  Validated  198  masses  regarding  overlap,  accuracy,  sensitivity,  and  specificity  statistics 

•  Reviewed  literature  regarding  reliability  and  consistency  among  expert  observers  [5-10] 

IV.  REPORTABLE  OUTCOMES 
Manuscripts: 

1.  L.  Kinnard,  S.-C.  B.  Lo,  E.  Makariou,  T.  Osicka,  P.  Wang,  M.T.  Freedman,  M.  Chouikha,  “Likelihood 
Function  Analysis  for  Segmentation  of  Mammographic  Masses  for  Various  Margin  Groups”, 
Proceedings  of  the  IEEE  Symposium  on  Biomedical  Imaging,  April  2004. 

2.  L.  Kinnard,  S.-C.  B.  Lo,  E.  Makariou,  T.  Osicka,  P.  Wang,  M.T.  Freedman,  M.  Chouikha,  “Steepest 
changes  of  a  probability-based  cost  function  for  delineation  of  mammographic  masses:  A  validation 
study”.  Medical  Physics  (manuscript  submitted  12/03,  revised  manuscript  submitted  4/04,  revised 
manuscript  accepted  6/04) 

3.  L.  Kinnard,  Ph.D.  thesis,  “Segmentation  of  Mass  Bodies  and  Their  Extended  Borders  on 
Mammograms  Using  Maximum-Likelihood  Analysis”,  June  2003. 

Oral  Presentations: 

1.  L.  Kinnard,  S.-C.  B.  Lo,  E.  Makariou,  T.  Osicka,  P.  Wang,  M.T.  Freedman,  M.  Chouikha,  “Likelihood 
Function  Analysis  for  Segmentation  of  Mammographic  Masses  for  Various  Margin  Groups”, 
Proceedings  of  the  IEEE  Symposium  on  Biomedical  Imaging,  April  2004. 

2.  “Breast  Cancer  Research:  Computer-Aided  Diagnosis  and  Image  Segmentation”,  Howard 
University  Cancer  Center,  December  2003. 

Technical  Development  Activities: 

•  Attended  two  cancer  imaging  workshops  sponsored  by  the  Washington  Academy  of  Biomedical 
Engineering  (WABME): 

1.  9/29/03:  "Individualized  Treatment  Using  Pharmaco-Genomics  &  Functional 
Imaging"  (George  Washington  University) 
2. 11/12/03: “Cancer Imaging for the Operating Room of 2020” (Georgetown University)

•  Attended  weekly  cancer  workshops  conducted  by  the  Howard  University  Cancer  Center 

•  Attended  SPIE  Medical  Imaging  Meeting  (February,  2004,  San  Diego,  CA) 

• Taught “Computer-Aided Diagnosis” portion of “Introduction to Imaging Technologies” course, The
Catholic University of America (course number ENGR552)

V.  CONCLUSIONS 

Overall the segmentation method has produced overlap and accuracy values that reflect the difficulties of
the data set. These values are generally lower than those of previous experiments because the
boundaries of dense tissue masses are exceedingly difficult to locate. The flooding phenomenon is also
responsible for these low values because in some cases the intensity values of the masses are very close to
those of surrounding fibroglandular tissue. The difficulty of locating mass boundaries is also reflected in
some cases where there are significant differences between the expert traces. The computer-segmented
results are more closely correlated with one expert in some cases and more
closely correlated with the second expert in other cases. During the next phase of this research work, the
PI will compare the computer-segmented results to each observer as well as compare the experts’ traces
to each other in an effort to answer the questions about expert reliability and consistency. Once these
questions have been answered we believe that it will be possible to decide how the images need to be filtered
or if they need to be filtered at all.

ABSTRACT

The  purpose  of  this  work  was  to  develop  an  automatic  boundary 
detection  method  for  mammographic  masses  and  to  observe  the 
method’s performance on four of the five margin groups
as defined by the ACR, namely, spiculated, ill-defined,
circumscribed, and obscured. The segmentation method utilized a
maximum  likelihood  steep  change  analysis  technique  that  is 
capable  of  delineating  ill-defined  borders  of  the  masses.  Previous 
investigators  have  shown  that  the  maximum  likelihood  function 
can  be  utilized  to  determine  the  border  of  the  mass  body.  The 
method  was  tested  on  122  digitized  mammograms  selected  from 
the  University  of  South  Florida’s  Digital  Database  for  Screening 
Mammography  (DDSM).  The  segmentation  results  were 
validated  using  overlap  and  accuracy  statistics,  where  the  gold 
standards  were  manual  traces  provided  by  two  expert 
radiologists.  We  have  concluded  that  the  intensity  threshold  that 
produces  the  best  contour  corresponds  to  a  particular  steep 
change  location  within  the  likelihood  function. 

1.  INTRODUCTION 

In  a  CADx  system,  segmentation  is  arguably  one  of  the  most 
important  aspects  -  particularly  for  masses  -  because  strong 
diagnostic  predictors  for  masses  are  shape  and  margin  type  [2,9]. 
The  margin  of  a  mass  is  defined  as  the  interface  between  the  mass 
and  surrounding  tissue  [2].  Furthermore,  breast  masses  can  have 
unclear  borders  and  are  sometimes  obscured  by  glandular  tissue 
in  mammograms.  A  spiculated  mass  consists  of  a  central  mass 
body  surrounded  by  fibrous  projections,  hence  the  resulting 
stellate  shape.  For  the  aforementioned  reasons,  proper 
segmentation  -  to  include  the  body  and  periphery  -  is  extremely 
important  and  is  essential  for  the  computer  to  analyze,  and  in 
turn,  determine  the  malignancy  of  the  mass  in  mammographic 
CADx  systems. 

Over the years researchers have used many methods to segment
masses in mammograms. Petrick et al. [7] developed the Density
Weighted Contrast Enhancement (DWCE) method, in which a
series of filters is applied to the image in an attempt to extract
masses. Comer et al. [1] segmented digitized mammograms into
homogeneous texture regions by assigning each pixel to one of a
set of classes such that the number of incorrectly classified pixels
was minimized via Maximum Likelihood (ML) analysis. Li [5]
developed a method that employs k-means classification to
classify pixels as belonging to the region of interest (ROI) or
background.

Kupinski  and  Giger  developed  a  method  [4],  which  uses  ML 
analysis  to  determine  final  segmentation.  In  their  method,  the 
likelihood  function  is  formed  from  likelihood  values  determined 
by  a  set  of  image  contours  produced  by  the  region  growing 
method.  This  method  is  a  highly  effective  one  that  was  also 
implemented  by  Te  Brake  and  Karssemeijer  in  their  comparison 
between  the  discrete  dynamic  contour  model  and  the  likelihood 
method  [9].  For  this  reason  we  chose  to  investigate  its  use  as  a 
possible  starting  point  from  which  a  second  method  could  be 
developed.  Consequently  in  our  implementation  of  this  work  we 
discovered  an  important  result,  i.e.,  the  maximum  likelihood  steep 
change.  It  appears  that  in  many  cases  this  method  produces 
contour  choices  that  encapsulate  important  borders  such  as  mass 
spiculations  and  ill-defined  borders. 

2.  METHODS 

2.1  Initial  Contours 

As an initial segmentation step, we followed the overall region
similarity concept to aggregate the area of interest [1, 4]. Used
alone, this step generates a sequence of contours representing the
mass; however, the computer is not able to choose the contour that is
most closely correlated with the experts’ delineations.
We have therefore devised an ML function steep change
analysis method that chooses the best contour that delineates the
mass body as well as its extended borders, i.e., extensions into
spiculations and areas in which the borders are ill-defined or
obscured. This method is an extension of the method developed
by Kupinski and Giger [4] that uses ML function analysis to
select the contour which best represents the mass, as compared to
expert radiologist traces. We have determined that this technique
can select the contour that accurately represents the mass body
contour for a given set of parameters; however, further analysis
of the likelihood function revealed that the computer could
choose a set of three segmentation contour choices from the
entire set of contour choices, and then make a final decision from
these three choices.

The algorithm can be summarized in several steps. Initially, we
use an intensity-based thresholding scheme to generate a
sequence of grown contours ($S_i$), where gray value is the
similarity criterion. The image is also multiplied by a 2D
trapezoidal membership function (2D shadow), whose upper base
measures 40 pixels and lower base measures 250 pixels (1 pixel =
50 microns). The image to which the shadow has been applied is
henceforth referred to as the "fuzzy" image. The original image
and its fuzzy version were used to compute the likelihood of the
mass’s boundaries. The computation method comprises
two components for a given boundary: (1) formulation of the
composite probability and (2) evaluation of likelihood.
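
The 2D shadow described above can be sketched as follows. The text gives only the base widths (40 and 250 pixels), so several details here are assumptions made for illustration: the trapezoid is taken to be symmetric about a chosen mass center, its sides are taken to be linear ramps, and the 2D function is formed as the minimum of two 1D trapezoids. All function names are hypothetical.

```python
def trapezoid_1d(distance, upper_base=40, lower_base=250):
    """Membership value at a signed pixel distance from the center.

    1.0 across the upper base, 0.0 beyond the lower base, and a
    linear ramp in between (the ramp shape is an assumption).
    """
    half_top = upper_base / 2.0
    half_bottom = lower_base / 2.0
    d = abs(distance)
    if d <= half_top:
        return 1.0
    if d >= half_bottom:
        return 0.0
    return (half_bottom - d) / (half_bottom - half_top)


def shadow_2d(row, col, center):
    """Assumed 2D form: minimum of the row and column memberships."""
    return min(trapezoid_1d(row - center[0]), trapezoid_1d(col - center[1]))


def apply_shadow(image, center):
    """Multiply every pixel by the shadow to obtain the 'fuzzy' image."""
    return [
        [value * shadow_2d(r, c, center) for c, value in enumerate(line)]
        for r, line in enumerate(image)
    ]


samples = [trapezoid_1d(d) for d in (0, 20, 72.5, 125, 200)]
print(samples)
```

With these defaults a pixel within 20 pixels of the center keeps its full gray value, and a pixel 125 or more pixels away is suppressed entirely.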

In  addition,  we  chose  to  aggregate  contours  using  the  original 
image.  This  accounts  for  the  major  difference  from  that 
implemented  by  the  previous  investigators.  Since  smoother 
contours  were  not  used,  the  likelihood  function  showed  greater 
variations.  In  many  situations,  the  greatest  variations  occurred 
when  there  was  a  sudden  increase  of  the  likelihood,  and  this  was 
strongly  correlated  with  the  end  of  the  mass  border  growth.  This 
phenomenon  would  be  suppressed  if  the  fuzzy  image  was  used  to 
generate  the  contours.  The  fuzzy  image  was  used  mainly  to 
construct  the  likelihood  function. 

2.2  Composite  Probability  Formation 

For a contour ($S_i$), the composite probability ($C_i$) is calculated:

$$C_i \mid S_i = p(f_i(x,y) \mid S_i) \times p(m_i(x,y) \mid S_i) \qquad (1)$$

The quantity $f_i(x,y)$ is the area to which the 2D shadow has been
applied, and $p(f_i(x,y) \mid S_i)$ is the probability density function of the
pixels inside $S_i$, where $i$ is the region growing step associated
with a given intensity threshold. The quantity $m_i(x,y)$ is the area
outside $S_i$ (non-fuzzy), and $p(m_i(x,y) \mid S_i)$ is the probability density
function of the pixels outside $S_i$. Next we find the logarithm of
the composite probability of the two regions, $C_i$:

$$\log(C_i \mid S_i) = \log\big(p(f_i(x,y) \mid S_i)\big) + \log\big(p(m_i(x,y) \mid S_i)\big) \qquad (2)$$
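
A minimal sketch of how Equation (2) might be evaluated for one grown region is given below. It rests on assumptions that the text does not state: the densities are estimated as simple relative-frequency histograms over quantized gray levels, and the log density is summed over every pixel in each area. The function names are hypothetical.

```python
import math
from collections import Counter


def log_density_sum(values):
    """Sum of log p(v) over all values, where p is the relative
    frequency of each gray level (assumed histogram estimator)."""
    counts = Counter(values)
    total = len(values)
    return sum(n * math.log(n / total) for n in counts.values())


def log_composite(fuzzy_image, original_image, region):
    """Log composite probability for one contour.

    Inside pixels are read from the fuzzy image, outside pixels from
    the original (non-fuzzy) image, following the text around Eq. (1).
    `region` is a set of (row, col) pixels inside the contour.
    """
    rows, cols = len(original_image), len(original_image[0])
    inside = [fuzzy_image[r][c] for (r, c) in region]
    outside = [
        original_image[r][c]
        for r in range(rows)
        for c in range(cols)
        if (r, c) not in region
    ]
    return log_density_sum(inside) + log_density_sum(outside)


# Toy case: a uniform inside area and two distinct outside gray levels.
toy = [[5, 5], [1, 2]]
value = log_composite(toy, toy, region={(0, 0), (0, 1)})
print(value)
```

Evaluating this quantity once per region growing step gives one likelihood value per intensity threshold, which is the function analyzed in the following sections.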


2.3 Evaluation of Likelihood Function

The likelihood that the contour represents the fibrous portion of
the mass, i.e., the mass body, is determined by assessing the maximum
likelihood function:

$$\arg\max\big(\log(C_i \mid S_i)\big); \quad S_i,\ i = 1, \ldots, n \qquad (3)$$

Equation (3) finds the maximum value of the
aforementioned likelihood values as a function of intensity
threshold. It has been assessed (also by other investigators [4])
that the intensity value corresponding to this maximum likelihood
value is the optimal intensity needed to delineate the mass body
contour. However, in our implementation it was discovered that
the intensity threshold corresponding to the maximum likelihood
value confines the contour to the mass body. In our study many
of these contours did not include the extended borders. We
therefore hypothesize that the contour that represents the mass’s
extended borders may well be determined by assessing the
maximum changes of the likelihood function, i.e., by locating the
steepest likelihood value changes within the function:

$$\frac{d}{di}\big(\log(C_i \mid S_i)\big); \quad S_i,\ i = 1, \ldots, n \qquad (4)$$

Based on this assumption, we have carefully analyzed the
behavior of the maximum likelihood function. The analysis reveals
that the most accurate mass
delineation is usually obtained by using the intensity value
corresponding to the first or second steep change location within
the likelihood function immediately following the maximum
likelihood value.

Figure 1: A likelihood function with steep change indicators

2.4 Steep change definition

The term "steep change" is rather subjective and can be defined as a
location between two or more points in the function where the
likelihood values experience a significant change. In some cases
the likelihood function increases at a slow rate. The algorithm
design accounts for this issue by calculating the difference
between likelihood values in steps over several values and
comparing the results to two thresholds. The difference equation
is given by:

$$h(t) = f(z - wt) - f(z - w(t-1)), \quad t = 0, \ldots, N \qquad (5)$$

where $f$ is the likelihood function, $z$ is the maximum intensity, $w$ is
the width of the interval over which the likelihood differences are
calculated (e.g., for $w = 7$ differences are calculated every 7
points), and $N$ is the total number of points in the searchable area
divided by $w$. If the calculation in question yields a value greater
than or equal to a given threshold, then the intensity
corresponding to this location is considered to be a steep change
location. The threshold algorithm proceeds as follows:

If (h(t)_ML > ML_T1); t = 0,...,m
    Then choice 1 = intensity where that condition is satisfied
If (h(t)_ML > ML_T2); t = m,...,z
    Then choice 2 = intensity where that condition is satisfied

where h(t)_ML is the steep change value given by equation (5),
ML_T1 and ML_T2 are pre-defined threshold values, m is the
location in the function where the choice 1 condition is satisfied,
and z is the location in the function where the choice 2 condition
is satisfied. Once the condition is satisfied for the first threshold
value (ML_T1), its corresponding intensity value is used to
produce the segmentation contour for the first steep change
location. Once the condition is satisfied for ML_T2, its
corresponding intensity value is used to produce the segmentation
contour for the second steep change location.
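
The two-threshold search can be sketched in a few lines. This is an illustration under stated assumptions rather than the implementation used in the study: the likelihood function is held in a dictionary keyed by intensity, the difference is taken as the likelihood at the current step minus the likelihood at the previous step while moving down from the maximum intensity, the search starts at t = 1 so that every lookup stays inside the sampled range, and the search for the second location starts one step after the first. The function name and the toy likelihood values are hypothetical; the default thresholds are the experimentally chosen values reported for this study.

```python
def steep_change_choices(f, z, w, n_steps, ml_t1=1800, ml_t2=1300):
    """Return (choice1, choice2): the intensities at the first and
    second steep change locations, or None where no location is found.

    f       : dict mapping intensity -> log-likelihood value
    z       : maximum intensity (start of the search)
    w       : width of the interval between compared points
    n_steps : number of intervals to examine
    """
    def h(t):
        # Change in likelihood over one interval of width w (assumed form).
        return f[z - w * t] - f[z - w * (t - 1)]

    choice1 = choice2 = None
    m = None
    for t in range(1, n_steps + 1):
        if h(t) > ml_t1:
            choice1, m = z - w * t, t
            break
    if m is not None:
        # Assumption: resume one step after m so that the same jump
        # is not reported twice when ml_t2 < ml_t1.
        for t in range(m + 1, n_steps + 1):
            if h(t) > ml_t2:
                choice2 = z - w * t
                break
    return choice1, choice2


# Toy likelihood sampled every 7 gray levels below a maximum intensity of 100.
likelihood = {100: 0, 93: 500, 86: 2500, 79: 3000, 72: 4500, 65: 4600}
choices = steep_change_choices(likelihood, z=100, w=7, n_steps=5)
print(choices)
```

In the toy data the jump of 2000 between intensities 93 and 86 exceeds the first threshold, and the later jump of 1500 between 79 and 72 exceeds the second, so the two contours would be grown at intensities 86 and 72.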

2.5  Validation 

The segmentation method was validated on the basis of overlap
and accuracy [8,10]:

$$\text{Overlap} = \frac{N_{TP}}{N_{TP} + N_{FP} + N_{FN}} \qquad (6)$$

$$\text{Accuracy} = \frac{N_{TP} + N_{TN}}{N_{TP} + N_{TN} + N_{FP} + N_{FN}} \qquad (7)$$

where $N_{TP}$ is the true positive fraction, $N_{TN}$ is the true negative fraction,
$N_{FP}$ is the false positive fraction, and $N_{FN}$ is the false negative
fraction. The gold standards used for the validation study were
mass contours, which have been traced by expert radiologists.
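
As a worked illustration of the two statistics, the sketch below computes them from pixel sets. It assumes the standard definitions, overlap as true positives over the union of the two traces and accuracy as correctly labeled pixels over all pixels, and it uses pixel counts where the text speaks of fractions (the ratios are the same). The function name and the toy masks are hypothetical.

```python
def overlap_and_accuracy(computer, expert, all_pixels):
    """Compare a computer-segmented region with an expert trace.

    All three arguments are sets of (row, col) pixels; `all_pixels`
    is the full region of interest.
    """
    tp = len(computer & expert)               # mass in both traces
    fp = len(computer - expert)               # computer only
    fn = len(expert - computer)               # expert only
    tn = len(all_pixels - computer - expert)  # background in both
    union = tp + fp + fn
    overlap = tp / union if union else 0.0
    accuracy = (tp + tn) / len(all_pixels) if all_pixels else 0.0
    return overlap, accuracy


# Toy 4x4 ROI with two 2x2 traces that share one column.
roi = {(r, c) for r in range(4) for c in range(4)}
computer_trace = {(0, 0), (0, 1), (1, 0), (1, 1)}
expert_trace = {(0, 1), (1, 1), (0, 2), (1, 2)}
overlap, accuracy = overlap_and_accuracy(computer_trace, expert_trace, roi)
print(overlap, accuracy)
```

The example shows why accuracy runs higher than overlap in the results: the large number of background pixels that both traces agree on raises accuracy but does not enter the overlap measure.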

Our  experiments  produced  contours  for  the  intensity  values 
resulting  from  three  locations  within  the  likelihood  functions:  (1) 
The  intensity  for  which  a  value  within  the  likelihood  function  is 
maximum  (group  1  contour)  (2)  The  intensity  for  which  the 
likelihood  function  experiences  its  first  steep  change  (group  2 
contour)  and  (3)  The  intensity  for  which  the  likelihood  function 
experiences  its  second  steep  change  (group  3  contour).  We  have 
observed  that  the  intensity  for  which  the  likelihood  function 
experiences  its  first  steep  change  produces  the  contour  trace  that 
is  most  highly  correlated  with  the  gold  standard  traces,  regarding 
overlap  and  accuracy. 


3.  EXPERIMENTS  AND  RESULTS 

Here  we  describe  the  database  used,  describe  the  experiments, 
provide  visual  results  obtained  by  the  algorithm,  as  well  as  report 
the  results  obtained  by  the  ANOVA  test. 

3.1  Database 

For  this  study,  a  total  of  122  masses  were  chosen  from  the 
University  of  South  Florida’s  Digital  Database  for  Screening 
Mammography (DDSM) [3]. The films were digitized at
resolutions of 43.5 or 50 µm using either the Howtek or
Lumisys digitizers, respectively. The DDSM cases have been
ranked  by  expert  radiologists  on  a  scale  from  1  to  5,  where  1 
represents  the  most  subtle  masses  and  5  represents  the  most 
obvious  masses.  The  images  were  of  varying  subtlety  ratings. 
The  first  set  of  expert  traces  was  provided  by  an  attending 
physician  of  the  GUMC,  and  is  hereafter  referred  to  as  the  Expert 
A  traces.  The  second  set  of  expert  traces  was  provided  by  the 
DDSM,  and  is  hereafter  referred  to  as  the  Expert  B  traces. 

3.2  Experiments  and  Results 

As mentioned previously, the term “steep change” is very
subjective and therefore a set of thresholds needed to be set in an
effort to define a particular location within the likelihood function
as a “steep change location”. For this study the following
thresholds were experimentally chosen: ML_T1 = 1800 and
ML_T2 = 1300, where ML_T1 is the threshold for steep change location 1
for the likelihood function, and ML_T2 is the threshold for steep
change location 2 for the likelihood function. We performed a
number of experiments in an effort to show that the intensity for
which the likelihood function experiences the first steep change
produces the contour trace that is most highly
correlated with the gold standard traces regarding overlap and
accuracy.

First we present segmentation results for two malignant cases
followed by segmentation results for two benign cases. Each figure
contains an original image, traces for Experts A and B, and
computer segmentation results for groups 1, 2, and 3. Second,
we present plots of the mean values for various margin
groups for both overlap and accuracy measurements. The plots
present data for the spiculated and ill-defined groups of malignant
masses, and ill-defined and circumscribed groups of benign
masses. Data were not presented for the other categories because
there was not a sufficient number of cases.

Figure 2: Segmentation Results: Spiculated Malignant Mass

Figure 3: Segmentation Results: Ill-defined Malignant Mass

Figure 4: Segmentation Results: Obscured Malignant Mass

Figure 5: Segmentation Results: Ill-defined Benign Mass
[Panels: benign mass with ill-defined margins (subtlety = 3); Original ROI, Expert A Trace, Expert B Trace, Group 1 Result, Group 2 Result, Group 3 Result]

Figure 6: Segmentation Results: Circumscribed Benign Mass

Figure 7: Mean Measurement Values (Malignant Masses)

groups on the basis of overlap and accuracy for all margin
groups, therefore supporting our visual observations.

In future work, a worthwhile study would be to gather more
data for all margin groups in an effort to see if the various groups
require different parameter values to maximize the algorithm’s
robustness. Our ultimate goal is to optimize its performance for
those masses falling in the ill-defined and obscured margin groups
because segmentation of masses falling into those categories is
exceedingly difficult.
Abstract


The  purpose  of  this  work  was  to  develop  an  automatic  boundary  detection  method  for 
mammographic  masses  and  to  rigorously  test  this  method  via  statistical  analysis.  The 
segmentation  method  utilized  a  steepest  change  analysis  technique  for  the  determination  of  the 
mass boundaries based on a composed probability density cost function. Previous investigators
have shown that this function can be utilized to determine the border of the mass body. We have
further analyzed this method and have discovered that the steepest changes in this function can
produce mass delineations that include extended projections. The method was tested on 124
digitized  mammograms  selected  from  the  University  of  South  Florida’s  Digital  Database  for 
Screening  Mammography  (DDSM).  The  segmentation  results  were  validated  using  overlap, 
accuracy,  sensitivity,  and  specificity  statistics,  where  the  gold  standards  were  manual  traces 
provided  by  two  expert  radiologists.  We  have  concluded  that  the  best  intensity  threshold 
corresponds  to  a  particular  steepest  change  location  within  the  composed  probability  density 
function.  We  also  found  that  our  results  are  more  closely  correlated  with  one  expert  than  with 
the  second  expert.  These  findings  were  verified  via  Analysis  of  Variance  (ANOVA)  testing. 
The  ANOVA  tests  obtained  p-values  ranging  from  1.03x10'^  -  7.51x10''^  for  the  single  observer 
studies,  2.03x10'^  -  9.43x10  '^  for  the  two  observer  studies,  and  results  were  categorized  using 
three significance levels, i.e., p < 0.001 (extremely significant), p < 0.01 (very significant), and p
< 0.05 (significant).

Index  Terms:  mass  boundary  detection,  mammography,  probability-based  cost  function 
I. INTRODUCTION


In the United States, breast cancer accounts for one-third of all cancer diagnoses among
women and it has the second highest mortality rate of all cancer deaths in women. In several
studies it has been shown that only 13% - 29% of suspicious masses were determined to be
malignant, which indicates that there are high false positive rates for biopsied breast masses.
A higher predictive rate is anticipated by combining the mammographer's interpretation and the
computer analysis. Other studies have shown that 7.6% - 14% of the patients have
mammograms that produce false negative diagnoses. Alternatively, a Computer Assisted
Diagnosis (CADx) system can serve as a clinical tool for the radiologist and consequently lower
the rate of missed breast cancer.

Generally,  CADx  systems  consist  of  three  major  stages,  namely,  segmentation,  feature 
calculation,  and  classification.  Segmentation  is  arguably  one  of  the  most  important  aspects  of 
CADx  -  particularly  for  masses  -  because  a  strong  diagnostic  predictor  for  masses  is  shape. 
Specifically,  many  malignant  masses  have  ill-defined,  and/or  spiculated  borders  and  many 
benign  masses  have  well-defined,  rounded  borders.  Furthermore,  breast  masses  can  have 
unclear  borders  and  are  sometimes  obscured  by  glandular  tissue  in  mammograms.  During  the 
search  for  suspicious  areas  it  is  possible  that  masses  of  this  type  are  overlooked  by  radiologists. 
When  a  specific  area  is  deemed  to  be  suspicious,  the  radiologist  analyzes  the  overall  mass, 
including  its  shape  and  margin  characteristics.  The  margin  of  a  mass  is  defined  as  the  interface 
between  the  mass  and  surrounding  tissue,  and  is  regarded  by  some  as  one  of  the  most  important 
factors in determining its significance. Specifically, a spiculated mass consists of a central
mass  body  surrounded  by  fibrous  extensions,  hence  the  resulting  stellate  shape.  In  this  context, 
“extension”  refers  to  those  portions  of  the  mass  containing  ill-defined  borders,  spiculations, 
fibrous  borders,  and  projections.  Although  the  diameters  of  these  cancers  are  measured  across 
the central portion of the mass, microscopic analysis of the extensions also reveals associated
cancer cells, i.e., the extended projections may contain active mass growth. In addition, the
features of the extended projections and ill-defined borders are highly useful for identifying
masses. Hence, proper segmentation - to include the body and periphery - is extremely
important and is essential for the computer to analyze, and in turn, determine the malignancy of
the mass in mammographic CADx systems.

Te Brake and Karssemeijer implemented a discrete dynamic contour model, a method
similar to snakes, which begins as a set of vertices connected by edges (initial contour) and
grows subject to internal and external forces. Li developed a method that employs k-means
classification to classify pixels as belonging to the region of interest (ROI) or background.
Petrick et al. developed the Density Weighted Contrast Enhancement (DWCE) method, in
which a series of filters is applied to the image in an attempt to extract masses. Pohlman et al.
developed an adaptive region growing method whose similarity criterion is determined from
calculations made in 5x5 windows surrounding the pixel of interest. Mendez et al. developed
a method that combined bilateral image subtraction and region growing to segment masses.

Several studies have also used probability-based analysis to segment digitized
mammograms. Li et al. developed a segmentation method that first models the histogram of
mammograms using a finite generalized Gaussian mixture (FGGM) and then uses a contextual
Bayesian relaxation labeling (CBRL) technique to find suspected masses. Furthermore, this
method uses the Expectation-Maximization (EM) technique in developing the FGGM model.
Comer et al. utilized an EM technique to segment digitized mammograms into homogeneous
texture regions by assigning each pixel to one of a set of classes such that the number of
incorrectly classified pixels was minimized. Kupinski and Giger developed a method that
combines region growing with probability analysis to determine the final segmentation. In their
method, the probability-based function is formed from a specific composed probability density
function, determined by a set of image contours produced by the region growing method. This
method is highly effective, and it was implemented by Te Brake and Karssemeijer in their
work comparing the results of the discrete dynamic contour model with those of
the probability-based method. For this reason we chose to investigate its use as a possible
starting point from which a second method could be developed. In our
implementation of this work we discovered an important result concerning the steepest changes of a
cost function composed from the two probability density functions of the regions. It appears that in
many cases this result produces contour choices that encapsulate important borders such as mass
spiculations and ill-defined borders.

Several CADx classification techniques have been developed. They are described here
to underscore the importance of accurate segmentation in CADx studies. Lo et al. have
developed an effective analysis method using the circular path neural network technique, which was
specifically designed to classify segmented objects and can be extended to
applications related to mass classification. Polakowski et al. used a multilayer perceptron
(MLP) neural network to distinguish malignant and benign masses. Both Sahiner et al. and
Rangayyan et al. used linear discriminant analysis to distinguish benign masses from malignant
masses. While many CADx systems have been developed, the development of fully-automated
image segmentation algorithms for breast masses has proven to be a daunting task.

II. METHODS

A.  Segmentation  method  -  Maximum  change  of  cost  function  as  a  continuation  of 
probability-based  function  analysis 
As a point of clarification, in this work we refer to the function used to find optimal
region growing contours in Kupinski and Giger’s study as the probability-based function and
we refer to our function as the cost function. The two functions are similar; however, they differ in
terms of the images used in their formation. As an initial segmentation step, region growing is
used to aggregate the area of interest, where grayscale intensity is the similarity criterion.
This phase of the algorithm starts with a seed point whose intensity is high, and nearby pixels with
values greater than or equal to this value are included in the region of interest. As the intensity
threshold decreases, the region increases in size; there is therefore an inverse relationship
between intensity value and contour size. In many cases the region growing method is
extremely effective in producing contours that are excellent delineations of mammographic
masses. However, the computer is not able to choose the contour that is most highly correlated
with the experts’ delineations, particularly for masses that contain ill-defined margins or
margins that extend into surrounding fibroglandular tissue. Furthermore, the task of asking a
radiologist to visually choose the best contour would be both time intensive and extremely
subjective from one radiologist to another.

The segmentation technique described in this work attempts to automate this
process by adding a two-dimensional (2D) shadow and probability-based components to the
segmentation algorithm. Furthermore, we have devised a steepest change analysis
method that chooses the contour that best delineates the mass body as well as its
extended borders, i.e., extensions into spiculations and areas in which the borders are ill-defined
or obscured. Previous investigators demonstrated that the probability-based function is capable
of extracting the central portion of the mass density, and in
this work the method has been advanced further such that it can include the extensions of the
masses. The enhanced method can produce contours that closely match expert radiologist
traces. Specifically, it has been observed that this technique can select the contour that
accurately represents the mass body contour for a given set of parameters. However, further
analysis of the cost function composed from the probability density functions inside and outside
of a given contour revealed that the computer could choose a set of three segmentation contour
choices from the entire set of contour choices, and later make a final decision from these three
choices.

1.  Region  growing  and  pre-processing 

Initially, a 512x512 pixel area surrounding the mass is cropped. The region growing
technique was employed to aggregate the region of interest, where the similarity criterion
for our region growing algorithm is grayscale intensity. To start the growth of the first region, a
seed point was placed at the center of the 512x512 ROI. The region growing process continues
by decreasing the intensity value until we have grown a sufficiently large set of contours.
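
The threshold-sweeping region growing step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in the study: the function names `grow_region` and `grow_contours`, the 4-connected neighborhood, and the toy 5x5 ROI are hypothetical, since the text specifies only that grayscale intensity is the similarity criterion and that the seed is placed at the ROI center.

```python
from collections import deque


def grow_region(image, seed, threshold):
    """Return the set of (row, col) pixels 4-connected to `seed` whose
    intensity is >= threshold (empty if the seed itself is below it).
    The 4-connectivity is an assumption; the source does not specify it."""
    rows, cols = len(image), len(image[0])
    r0, c0 = seed
    if image[r0][c0] < threshold:
        return set()
    region = {seed}
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if (0 <= nr < rows and 0 <= nc < cols
                    and (nr, nc) not in region
                    and image[nr][nc] >= threshold):
                region.add((nr, nc))
                queue.append((nr, nc))
    return region


def grow_contours(image, seed, start, stop, step=1):
    """Grow one region per intensity threshold, from `start` down to `stop`."""
    return {t: grow_region(image, seed, t)
            for t in range(start, stop - 1, -step)}


# Toy 5x5 ROI with a bright centre; the seed sits at the centre pixel.
roi = [
    [1, 1, 1, 1, 1],
    [1, 5, 6, 5, 1],
    [1, 6, 9, 6, 1],
    [1, 5, 6, 5, 1],
    [1, 1, 1, 1, 1],
]
regions = grow_contours(roi, seed=(2, 2), start=9, stop=5)
sizes = {t: len(region) for t, region in regions.items()}
```

In the toy example the region grows as the threshold drops, which mirrors the inverse relationship between intensity value and contour size noted earlier.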

Next, the image is multiplied by a 2D trapezoidal membership function with rounded
corners whose upper base measures 40 pixels and lower base measures 250 pixels (1 pixel = 50
microns). This function was chosen because it is a good model of the mammographic mass’s
intensity distribution. Since the ROIs have been cropped such that the mass's center is
located at the center of the 512 pixel x 512 pixel area, shadow multiplication emphasizes pixel
values at the center of the ROI and suppresses background pixels. The image to which the
shadow has been applied is henceforth referred to as the "processed" image. The original image
and its processed version were used to determine the most likely boundary of the mass. The
computation method comprises two components for a given boundary: (1) formulation of
the composed probability as a cost function and (2) evaluation of the cost function.
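
One possible reading of the shadow multiplication is sketched below. The text gives only the base widths (40 and 250 pixels) and mentions rounded corners, so the radially symmetric profile, the linear ramp, and the names `trapezoid_weight` and `apply_shadow` are assumptions made for illustration rather than the exact function used in the study.

```python
import math


def trapezoid_weight(r, upper_base=40.0, lower_base=250.0):
    """Radial trapezoidal membership value at distance r from the centre:
    1 on the plateau, a linear ramp down to 0, and 0 beyond the lower base.
    Radial symmetry is an assumed interpretation of 'rounded corners'."""
    inner, outer = upper_base / 2.0, lower_base / 2.0
    if r <= inner:
        return 1.0
    if r >= outer:
        return 0.0
    return (outer - r) / (outer - inner)


def apply_shadow(image):
    """Multiply each pixel by the weight at its distance from the ROI centre,
    returning the 'processed' image as a new list of rows."""
    rows, cols = len(image), len(image[0])
    cr, cc = (rows - 1) / 2.0, (cols - 1) / 2.0
    return [
        [image[r][c] * trapezoid_weight(math.hypot(r - cr, c - cc))
         for c in range(cols)]
        for r in range(rows)
    ]


# A tiny uniform ROI lies entirely on the plateau, so it is left unchanged.
processed_demo = apply_shadow([[10, 10, 10], [10, 10, 10], [10, 10, 10]])
```

With these defaults, pixels within 20 pixels of the center keep their value and pixels farther than 125 pixels from the center are set to zero, which matches the stated intent of emphasizing the center and suppressing the background.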
The contours were grown using the original image as opposed to the processed image, and
this accounts for a major difference between the current implementation and that
of the previous investigators. By using contours generated from the original image, a cost
function composed from the probability density functions inside and outside of the contours was
produced. In many situations, the greatest changes in contour shape and size occur at sudden
decreases within the function. In analyzing these steep changes it was observed that the
intensity values corresponding to the steep changes typically produced contours that
encapsulated both the mass body and its spiculated projections or ill-defined margins.
This phenomenon would be suppressed if the processed image were used to generate the contour.
A more detailed discussion of steep changes within the cost function is given in section
II.A.2.3.

The processed image was mainly used to construct the cost function. A common
technique used in mass segmentation studies is to pre-process the images using some type of
filtering mechanism in an effort to separate the mass from surrounding fibroglandular tissue.
This method could be particularly beneficial to the region growing process because it would aid
in preventing the regions from growing into surrounding tissue. However, the filtering
process could impede our goal of encapsulating a mass’s extended borders as well as
borders that are ill-defined, because filtering tends to create rounded edges on
margins that are actually jagged, i.e., spiculated. This phenomenon could potentially defeat the
goal of extracting mass borders. For these reasons, we have chosen to aggregate the contours
using the original ROI rather than its processed version.
2. Formulation of the composed probability as a cost function

In the context of this work, the composed probability is defined from the probability density
functions of the pixels inside and outside a contour, using a processed and a non-processed version
of an image. Specifically, for a contour ($S_i$), the composed probability ($C_i$) is calculated as:

$$C_i \mid S_i = \prod_{j=0}^{h} p\big(f_i(x,y) \mid S_i\big) \times \prod_{j=0}^{h} p\big(m_i(x,y) \mid S_i\big) \qquad (1)$$

The quantity $f_i(x,y)$ is the set of pixels that lie inside the contour $S_i$ (see Fig. 1a), and this area
contains processed pixel values. The quantity $p(f_i(x,y) \mid S_i)$ is the probability density function of
the pixels inside $S_i$, where ‘i’ is the intensity threshold used to produce the contours
given by the region growing step, and ‘h’ is the maximum intensity value. The quantity $m_i(x,y)$
is the set of pixels that lie outside the contour $S_i$ (see Fig. 1b), and this area contains
non-processed pixels. The quantity $p(m_i(x,y) \mid S_i)$ is the probability density function of the pixels
outside $S_i$, where ‘i’ is the intensity threshold used to produce the contours given by the region
growing step, and ‘h’ is the maximum intensity value. For implementation purposes, the
logarithm of the composed probability of the two regions, $C_i$, was used:

$$\log(C_i \mid S_i) = \log\Big(\prod_{j=0}^{h} p\big(f_i(x,y) \mid S_i\big)\Big) + \log\Big(\prod_{j=0}^{h} p\big(m_i(x,y) \mid S_i\big)\Big) \qquad (2)$$
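
As a rough sketch of how the logarithm of the composed probability might be evaluated for one contour, the code below sums log-densities over the processed pixels inside the contour and the non-processed pixels outside it. The density estimator is an assumption: each region's density is taken from its own normalized histogram, which the source does not specify, and the names `log_density_sum` and `log_composed_probability` are hypothetical.

```python
import math
from collections import Counter


def log_density_sum(values):
    """Sum of log p(v) over all pixels in `values`, with p estimated from the
    normalized histogram of those same values (an assumed estimator)."""
    counts = Counter(values)
    total = len(values)
    return sum(n * math.log(n / total) for n in counts.values())


def log_composed_probability(processed, original, inside):
    """Log of the composed probability for one contour.

    `inside` is the set of (row, col) pixels enclosed by the contour.
    Inside pixels are read from the processed image, outside pixels from
    the original image, following the description in the text."""
    rows, cols = len(original), len(original[0])
    f_vals = [processed[r][c] for r in range(rows) for c in range(cols)
              if (r, c) in inside]
    m_vals = [original[r][c] for r in range(rows) for c in range(cols)
              if (r, c) not in inside]
    return log_density_sum(f_vals) + log_density_sum(m_vals)


# Toy 2x2 example: a homogeneous inside region and a mixed outside region.
toy = [[1, 1], [2, 3]]
toy_cost = log_composed_probability(toy, toy, inside={(0, 0), (0, 1)})
```

Evaluating this quantity once per grown contour would give one cost value per intensity threshold, which is the form of cost function analyzed in the remainder of the section.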
Fig. 1: Four grown contours used to construct the cost function, starting from high intensity thresholds and
moving towards low intensity thresholds. Each contour separates the ROI into two parts: (a) the segmented
image (based on the processed image), used to compute the density function $p(f_i(x,y) \mid S_i)$, and (b) the masked image
(based on the non-processed original image), used to compute the density function $p(m_i(x,y) \mid S_i)$, for four intensity
threshold values. Panels: original image and (a)/(b) pairs, with labeled intensities of 2884, 2752, and 2539.


3.  The  cost  function  based  on  the  composed  probability  density  functions 

To  select  the  contour  that  represents  the  fibrous  portion  of  the  mass,  it  is  appropriate  to 
examine  the  maximum  value  of  the  cost  function: 

$$\arg\max \big( \log(C_i \mid S_i),\; S_i,\; i = 1, \dots, n \big) \qquad (3)$$

It has been assessed (also by other investigators) that the intensity value corresponding to this
maximum value is the optimal intensity needed to delineate the mass body contour. However,
in the current implementation it was discovered that the intensity threshold corresponding to the
maximum value confines the contour to the fibrous portion of the mass, i.e., the mass body. In
the study many of these contours did not include the extended borders. It is therefore
hypothesized that the contour that represents the mass’s extended borders may well be determined by
assessing the greatest changes of the cost function, i.e., by locating the steepest value changes within
the function:
$$\frac{d}{di} \big( \log(C_i \mid S_i),\; S_i,\; i = 1, \dots, n \big) \qquad (4)$$

Based  on  this  assumption,  the  cost  functions  associated  with  masses  were  analyzed.  The  analysis 
reveals  that  the  most  likely  boundaries  of  masses  associated  with  expert  radiologist  traces  are 
usually  produced  by  the  intensity  value  corresponding  to  the  first  or  second  steepest  change  of 
value  immediately  following  the  maximum  value  on  the  cost  function  (see  Fig.  2a).  The 
description  of  this  discovery  is  given  below  followed  by  a  validation  study  described  in  section 
II.B and results shown in section III. The overarching goal of the steepest change method is to
determine which contour best represents the mass and
its extended borders.

3.  The  definition  of  steepest  change 

The  term  "steepest  change"  is  rather  subjective  and  in  the  context  of  this  work  can  be 
defined  as  a  location  between  two  or  more  points  in  the  cost  function  where  the  values 
experience  a  significant  change.  When  the  values  are  plotted  as  a  function  of  intensity,  these 
significant  changes  are  often  visible  in  the  function.  In  some  cases  the  cost  function  increases 
at  a  slow  rate,  therefore  a  potential  steepest  change  location  could  be  missed.  The  algorithm 
design  compensates  for  this  issue  by  calculating  the  difference  between  values  in  steps  over 
several  values  and  comparing  the  results  to  two  threshold  values.  The  difference  equation  is 
given  by: 

$$d(t) = f(z - wt) - f\big(z - w(t+1)\big), \quad t = 0, \dots, m \qquad (5)$$

where f is the cost function, z is the maximum intensity, w is the width of the interval over which
the cost function differences are calculated (e.g., for w = 5, differences are calculated every 5
points), and m is the total number of points in the searchable area divided by w. Note that “wt”
is associated with a specific contour “i” described earlier. If d(t) yields a value
greater than or equal to a given threshold, then the intensity corresponding to this location is
determined to be a steepest change location. The threshold algorithm proceeds as follows:
If (d(t) ≥ TV1); t = 0,...,m

Then choice 1 = intensity where that condition is satisfied.

If (d(t) ≥ TV2); t = m,...,z

Then choice 2 = intensity where that condition is satisfied,

where TV1 and TV2 are pre-defined threshold values, m is the location in the function where the
choice 1 condition is satisfied, and z is the location in the function where the choice 2 condition
is satisfied. During the examination of the contour growth with respect to the cost function, the
first steepest change (i.e., d(t)_MC1 as choice 1) is determined by TV1 immediately after the
location of the maximum cost function value (corresponding to the mass body discussed earlier).
The second steepest change (i.e., d(t)_MC2 as choice 2) is determined by TV2 after the first
steepest change has been established.
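
The difference equation and the two-threshold search can be sketched as below. This is an illustration under several assumptions: the cost function is stored as a dictionary keyed by intensity, the search starts at the intensity `z` of the cost-function maximum, and the intensity reported for a steepest change is taken to be the lower end of the interval over which the drop occurs. The names `differences` and `steepest_changes` and the toy values are hypothetical and are not taken from the study.

```python
def differences(cost, z, w, m):
    """d(t) = f(z - w*t) - f(z - w*(t + 1)) for t = 0..m, following Eq. (5).
    `cost` maps an intensity to its cost-function value."""
    return [cost[z - w * t] - cost[z - w * (t + 1)] for t in range(m + 1)]


def steepest_changes(cost, z, w, m, tv1, tv2):
    """Return (choice1, choice2): the intensities of the first drop >= tv1
    and of the next later drop >= tv2. An entry is None if no drop qualifies."""
    choice1 = choice2 = None
    for t, drop in enumerate(differences(cost, z, w, m)):
        intensity = z - w * (t + 1)  # assumed: lower end of the interval
        if choice1 is None:
            if drop >= tv1:
                choice1 = intensity
        elif drop >= tv2:
            choice2 = intensity
            break
    return choice1, choice2


# Toy cost function sampled every 5 intensity levels, with its maximum at 100.
toy_cost_function = {100: 50, 95: 48, 90: 20, 85: 18, 80: 2, 75: 1}
choices = steepest_changes(toy_cost_function, z=100, w=5, m=4, tv1=25, tv2=15)
```

Together with the intensity at the maximum, the two returned intensities give the three candidate thresholds from which contours are grown. The toy thresholds are scaled to the toy data and carry no meaning beyond the example.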

As an example, Fig. 2a is used to illustrate how the algorithm is carried out. In this figure,
the maximum value on the cost function occurs for a grayscale intensity value of approximately
3330. The searching process begins from this maximum point, and the first
steepest change (d(t)_MC1 as choice 1) is found at a grayscale intensity value approximately equal
to 3200. From this point the searching process continues, and
the second steepest change (d(t)_MC2 as choice 2) is found at a grayscale intensity value
approximately equal to 3175. In summary, intensity values of 3330, 3200, and 3175 can be
used to grow 3 potential mass delineation candidates, and the large set of intensity choices has
been narrowed to 3 choices. In many cases, the intensities that produced the three contour
choices gave the following results:
(1) Intensity corresponding to the maximum value on the cost function: The central body of the
mass was encapsulated

(2) Intensity corresponding to the first steepest change on the cost function: The central body
of the mass + some of its extended borders (i.e., projections and spiculations) were
encapsulated

(3) Intensity corresponding to the second steepest change on the cost function: The central
body of the mass + more extended borders + surrounding fibroglandular tissue were encapsulated

The  intensity  corresponding  to  the  first  steepest  change  provides  the  best  choice,  and  an 
examination  of  this  observation  is  shown  and  discussed  in  sections  III  and  IV  of  this  work. 

As  stated  previously  the  steep  changes  within  the  cost  function  would  be  suppressed  if  the 
processed  image  was  used  to  generate  the  contour,  therefore  the  function  would  be  relatively 
smooth.  This  issue  is  evident  in  Fig.  2b,  which  shows  a  probability-based  function  produced  by 
contours  that  were  grown  using  a  processed  ROI. 
Fig. 2: (a) Example of a cost function with steepest change location indicators. (b) Example of a
probability-based function without an obvious steepest change location.




B.  Validation  method 


In several segmentation studies the results were validated using the overlap statistic alone.
In this work, however, it was necessary to analyze the performance of the steepest change algorithm on the
basis of four statistics to verify that the algorithm is indeed capable of categorizing mass and
background pixels correctly. This type of analysis provides helpful information regarding
necessary changes to the algorithm’s design and can possibly aid in its optimization.

The segmentation method was validated on the basis of overlap, accuracy, sensitivity,
and specificity. These statistics are calculated as follows:

$$\text{Overlap} = \frac{E \cap P}{E \cup P} \qquad (6)$$

$$\text{Accuracy} = \frac{N_{TP} + N_{TN}}{N_{TP} + N_{TN} + N_{FP} + N_{FN}} \qquad (7)$$

$$\text{Sensitivity} = \frac{N_{TP}}{N_{TP} + N_{FN}} \qquad (8)$$

$$\text{Specificity} = \frac{N_{TN}}{N_{TN} + N_{FP}} \qquad (9)$$

where E is the drawing produced by the expert radiologist, P is the segmentation result, $N_{TP}$ is
the true positive fraction (part of the image correctly classified as mass), $N_{TN}$ is the true negative
fraction (part of the image correctly classified as surrounding tissue), $N_{FP}$ is the false positive
fraction (part of the image incorrectly classified as mass), and $N_{FN}$ is the false negative fraction
(part of the image incorrectly classified as surrounding tissue). This method requires a gold
standard, i.e., a contour to which the segmentation results can be compared. The gold standards
for the experiments performed in this work were mass contours traced by
expert radiologists.
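
The four validation statistics can be computed directly from pixel sets, as in the following sketch. The function name `validation_stats`, the representation of regions as sets of (row, col) tuples, and the toy 4x4 grid are assumptions for illustration. The sketch also assumes that both regions lie within the image and that no denominator is zero.

```python
def validation_stats(expert, predicted, all_pixels):
    """Overlap, accuracy, sensitivity and specificity from pixel sets.

    `expert` is the gold-standard region E, `predicted` is the segmentation
    result P, and `all_pixels` is every pixel in the ROI."""
    tp = len(expert & predicted)   # correctly classified as mass
    fp = len(predicted - expert)   # incorrectly classified as mass
    fn = len(expert - predicted)   # incorrectly classified as tissue
    tn = len(all_pixels) - tp - fp - fn
    return {
        "overlap": tp / (tp + fp + fn),  # |E n P| / |E u P|
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


# Toy 4x4 ROI: the expert and computer regions share two of their four pixels.
grid = {(r, c) for r in range(4) for c in range(4)}
expert_region = {(0, 0), (0, 1), (1, 0), (1, 1)}
computer_region = {(0, 1), (1, 1), (0, 2), (1, 2)}
stats = validation_stats(expert_region, computer_region, grid)
```

In this toy case the accuracy is high (0.75) even though the overlap is only one third, which illustrates why overlap alone and accuracy alone can give different impressions of the same segmentation.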
The experiments produced contours for the intensity values resulting from three locations
within the cost function: (1) the intensity for which the cost function value is
maximum, (2) the intensity for which the cost function experiences its first steepest change, and
(3) the intensity for which the cost function experiences its second steepest change. It has
been observed that the intensity for which the cost function experiences its first steepest change
produces the contour trace that is most highly correlated with the gold standard traces with regard to
overlap and accuracy. In cases for which better results occur at the second steepest change
location, there is no significant difference between these results and the results calculated for the
first steepest change location. Second, it has been observed that the results are more closely
correlated with one expert than with the second expert. These hypotheses were tested using the
one-way Analysis of Variance (ANOVA) test. In this study, three significance levels (i.e., p
< 0.001, p < 0.01, and p < 0.05) were used to categorize the ANOVA results as described in the
next section.
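
For readers unfamiliar with the one-way ANOVA, the F statistic it is based on can be computed as in the sketch below. The function name `one_way_anova_f` and the sample values are hypothetical and are not data from this study. Converting F to a p-value requires the F distribution, which a statistics library can supply (for example, `scipy.stats.f_oneway` returns both the statistic and the p-value).

```python
def one_way_anova_f(*groups):
    """Return (F, df_between, df_within) for a one-way ANOVA over the groups.

    F is the ratio of the between-group mean square to the within-group
    mean square."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Made-up measurements for three groups, for illustration only.
f_stat, df_b, df_w = one_way_anova_f([1, 2, 3], [2, 3, 4], [5, 6, 7])
```

A large F indicates that the group means differ by more than the scatter within each group would explain.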

III.  EXPERIMENTS  AND  RESULTS 

The following sections describe the database and experiments and present the segmentation
results and ANOVA test results.

A.  Database 

For this study, a total of 124 masses were chosen from the University of South Florida's Digital
Database for Screening Mammography (DDSM). The DDSM films were digitized at 43.5 or
50 µm using the Howtek or Lumisys digitizer, respectively. The DDSM cases have
been ranked by expert radiologists on a scale from 1 to 5, where 1 represents the most subtle
masses and 5 represents the most obvious masses. Table 1 lists the distribution of the masses
studied according to their subtlety ratings. The images were of varying contrasts and the
masses were of varying sizes.

Table 1: Distribution of DDSM masses studied according to their subtlety ratings

Subtlety Category                        Cancer   Benign
Number of masses with a rating = 1          5        3
Number of masses with a rating = 2         12       12
Number of masses with a rating = 3         18       17
Number of masses with a rating = 4          9       23
Number of masses with a rating = 5         15       10

The  first  set  of  expert  traces  was  provided  by  an  attending  physician  of  the  GUMC,  and  is 
hereafter  referred  to  as  the  Expert  A  traces.  The  second  set  of  expert  traces  was  provided  by  the 
DDSM,  and  is  hereafter  referred  to  as  the  Expert  B  traces. 

B.  Experiments 

As mentioned previously, the term “steepest change” is very subjective, and therefore a set of
thresholds needed to be set in an effort to define a particular location within the cost function
as a “steepest change location”. For this study the following thresholds were experimentally
chosen: TV1 = 1800 and TV2 = 1300, where TV1 is the threshold for steepest change location 1 of the cost
function and TV2 is the threshold for steepest change location 2 of the cost function. A number
of experiments were performed in an effort to prove that (1) the intensity for which the cost
function experiences the first steepest change location produces the contour trace that is
most highly correlated with the gold standard traces with regard to overlap and accuracy. In
cases for which the second steepest change location achieves better results, there are no
significant differences between the values obtained from the first steepest change location and
the second steepest change location. The experiments linked with these hypotheses comprise
the studies for a single observer. We have also set out to prove that (2) our results are more
closely correlated with one expert than with the second expert. The experiments linked with
this hypothesis comprise the studies between two observers. First, segmentation results for two
malignant cases are presented, followed by segmentation results for two benign cases. Second,
the ANOVA results for a set of hypotheses are presented. The contours produced by the
maximum value as well as the steepest change locations within the cost functions are labeled as
follows:

(1)  group  1:  The  intensity  for  which  a  value  within  the  cost  function  is  maximum 

(2)  group  2:  The  intensity  for  which  the  cost  function  experiences  its  first  steepest  change 

(3)  group  3:  The  intensity  for  which  the  cost  function  experiences  its  second  steepest  change. 
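
The three groups above can be sketched in code. This is a minimal illustration under stated assumptions, not the procedure used in the study: it assumes the cost function is sampled at successive intensity levels, that a steepest change location is the first sample at which the absolute change between consecutive samples reaches the threshold (TV1 for location 1, TV2 for location 2), and that the search for location 2 starts after location 1. The function name and the sample values are hypothetical.

```python
def select_group_indices(cost, tv1=1800.0, tv2=1300.0):
    """Return indices (group1, group2, group3) into a sampled cost function.

    group1: index of the maximum value.
    group2: first index whose change to the next sample reaches tv1.
    group3: first later index whose change to the next sample reaches tv2.
    group2 or group3 is None when no such location exists.
    """
    if len(cost) < 2:
        raise ValueError("need at least two cost samples")
    # Group 1: location of the maximum value of the cost function.
    group1 = max(range(len(cost)), key=lambda i: cost[i])
    # Absolute change between consecutive samples (assumed steepness measure).
    diffs = [abs(cost[i + 1] - cost[i]) for i in range(len(cost) - 1)]
    # Group 2: first location whose change reaches TV1.
    group2 = next((i for i, d in enumerate(diffs) if d >= tv1), None)
    # Group 3: first later location whose change reaches TV2.
    group3 = None
    if group2 is not None:
        group3 = next(
            (i for i in range(group2 + 1, len(diffs)) if diffs[i] >= tv2), None
        )
    return group1, group2, group3


# Hypothetical cost values, for illustration only.
example_cost = [9000.0, 8800.0, 8700.0, 6500.0, 6400.0, 4900.0, 4800.0]
print(select_group_indices(example_cost))  # (0, 2, 4)
```

Each returned index would then be mapped back to its intensity, and the contour grown at that intensity would form the corresponding group.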

C.  Results 

Figures 3-6 display the results for two malignant cases and two benign cases, each accompanied
by its cost function. The ANOVA results appear in a set of tables (sections 2-4), where each
table lists the hypothesis tested along with p-values and their corresponding categorizations.
The p-values are categorized as follows: not significant (NS, p > 0.05), significant
(S, p < 0.05), very significant (VS, p < 0.01), and extremely significant (ES, p < 0.001). Each
p-value table is followed by a second table, which contains the mean values of overlap,
accuracy, sensitivity, and specificity for each group. Sections 2 and 3 describe identical
experiments; however, the pathologies of the masses differ (section 2: malignant masses,
section 3: benign masses). Although the experiments are identical, they have been separated
for clarity.
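
The categorization of p-values can be written as a small helper. This is a sketch only: the function name is hypothetical, and the handling of a p-value exactly equal to 0.05 (treated here as NS) is an assumption, since the text leaves that boundary unspecified.

```python
def categorize_p_value(p):
    """Map a p-value to the labels NS, S, VS, or ES used in the tables."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p-value must lie in [0, 1]")
    if p < 0.001:
        return "ES"  # extremely significant
    if p < 0.01:
        return "VS"  # very significant
    if p < 0.05:
        return "S"   # significant
    return "NS"      # not significant (p >= 0.05 by assumption)


for p in (0.2, 0.03, 0.004, 0.0002):
    print(p, categorize_p_value(p))
```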

A larger set of segmentation results has been placed in an image gallery containing 7
malignant mass results (Fig. A1) and 7 benign mass results (Fig. A2). These figures are located
in the Appendix.

Fig. 3 - (a) Segmentation results for a malignant mass with spiculated margins (subtlety = 2)

(b) the corresponding cost function

Fig. 4 - (a) Segmentation results for a malignant mass with ill-defined margins (subtlety = 3)

(b) the corresponding cost function

2. ANOVA test results for comparison of contour groups with single observer: malignant cases

Table 2: Single observer results (expert A gold standard, malignant masses)

ANOVA Test                                P-value            P-value            P-value
                                          (group 1 vs.       (group 2 vs.       (group 1 vs.
                                          group 2)           group 3)           group 3)

Difference between groups (overlap)       1.78x10^-? (ES)    2.91x10^-? (S)     NS
Difference between groups (accuracy)      NS                 3.14x10^-? (S)     NS
Difference between groups (sensitivity)   1.88x10^-? (ES)    NS                 1.85x10^-? (ES)
Difference between groups (specificity)   5.12x10^-? (ES)    2.40x10^-? (VS)    2.71x10^-? (ES)

Table 3: Mean values for overlap, accuracy, sensitivity, and specificity
(expert A gold standard, malignant masses)

Measurement    Mean Value    Mean Value    Mean Value
               (group 1)     (group 2)     (group 3)

Overlap        0.47          0.60
Accuracy       0.88          0.90          0.87
Sensitivity    0.49          0.75          0.81
Specificity    0.99          0.94          0.88

Table 4: Single observer results (expert B gold standard, malignant masses)

ANOVA Test                                P-value            P-value            P-value
                                          (group 1 vs.       (group 2 vs.       (group 1 vs.
                                          group 2)           group 3)           group 3)

Difference between groups (overlap)       3.96x10^-? (ES)    NS                 1.58x10^-?
Difference between groups (accuracy)      NS                 NS                 NS
Difference between groups (sensitivity)   4.88x10^-? (ES)    4.31x10^-? (S)     4.25x10^-? (ES)
Difference between groups (specificity)   2.70x10^-? (ES)    4.36x10^-? (ES)    1.44x10^-? (ES)

Table 5: Mean values for overlap, accuracy, sensitivity, and specificity
(expert B gold standard, malignant masses)

Measurement    Mean Value    Mean Value    Mean Value
               (group 1)     (group 2)     (group 3)

Overlap        0.38          0.54          0.51
Accuracy       0.83          0.86          0.84
Sensitivity    0.38          0.56          0.60
Specificity    1.00          0.98          0.94

3. ANOVA test results for comparison of contour groups with single observer: benign cases

Table 6: Single observer results (expert A gold standard, benign masses)

ANOVA Test                                P-value            P-value            P-value
                                          (group 1 vs.       (group 2 vs.       (group 1 vs.
                                          group 2)           group 3)           group 3)

Difference between groups (overlap)       3.19x10^-? (ES)    8.38x10^-? (ES)    NS
Difference between groups (accuracy)      NS                 4.73x10^-? (VS)    2.51x10^-? (VS)
Difference between groups (sensitivity)   1.14x10^-? (ES)    1.89x10^-? (S)     7.51x10^-? (ES)
Difference between groups (specificity)   8.93x10^-? (VS)    1.24x10^-? (VS)    3.32x10^-? (ES)

Table 7: Mean values for overlap, accuracy, sensitivity, and specificity
(expert A gold standard, benign masses)

Measurement    Mean Value    Mean Value    Mean Value
               (group 1)     (group 2)     (group 3)

Overlap        0.46          0.58          0.45
Accuracy       0.90          0.91          0.85
Sensitivity    0.49          0.73          0.82
Specificity    0.99          0.94          0.86

Table 8: Single observer results (expert B gold standard, benign masses)

ANOVA Test                                P-value            P-value            P-value
                                          (group 1 vs.       (group 2 vs.       (group 1 vs.
                                          group 2)           group 3)           group 3)

Difference between groups (overlap)       8.82x10^-? (ES)    NS                 1.62x10^-? (S)
Difference between groups (accuracy)      NS                 2.62x10^-? (S)     2.48x10^-? (S)
Difference between groups (sensitivity)   1.61x10^-? (ES)    NS                 3.14x10^-? (ES)
Difference between groups (specificity)   1.18x10^-? (S)     1.27x10^-? (S)     1.25x10^-? (ES)

Table 9: Mean values for overlap, accuracy, sensitivity, and specificity
(expert B gold standard, benign masses)

Measurement    Mean Value    Mean Value    Mean Value
               (group 1)     (group 2)     (group 3)

Overlap        0.36          0.51          0.44
Accuracy       0.88          0.89          0.83
Sensitivity    0.36          0.61          0.69
Specificity    0.99          0.94          0.86

4. ANOVA test results for comparison of contour groups between two observers

Table 10: Two observer results: expert A vs. expert B, malignant masses

ANOVA Test                                P-value            P-value            P-value
                                          (group 1 vs.       (group 2 vs.       (group 1 vs.
                                          group 2)           group 3)           group 3)

Expert A vs. Expert B (overlap)           3.12x10^-? (VS)    3.32x10^-? (S)     NS
Expert A vs. Expert B (accuracy)          1.20x10^-? (S)     4.46x10^-? (S)     NS
Expert A vs. Expert B (sensitivity)       9.43x10^-? (ES)    3.38x10^-? (ES)    3.67x10^-? (ES)
Expert A vs. Expert B (specificity)       NS                 NS                 NS

Table 11: Mean values for overlap, accuracy, sensitivity, and specificity
(expert A vs. expert B, malignant masses)

Measurement    Mean Value,  Mean Value,  Mean Value,  Mean Value,  Mean Value,  Mean Value,
               Expert A     Expert B     Expert A     Expert B     Expert A     Expert B
               (group 1)    (group 1)    (group 2)    (group 2)    (group 3)    (group 3)

Overlap        0.49         0.38         0.62         0.55         0.55         0.51
Accuracy       0.89         0.83         0.91         0.87         0.87         0.84
Sensitivity    0.52         0.38         0.75         0.60         0.82         0.68
Specificity    0.99         1.00         0.95         0.97         0.89         0.91

Table 12: Two observer results: expert A vs. expert B, benign masses

ANOVA Test                                P-value            P-value            P-value
                                          (group 1 vs.       (group 2 vs.       (group 1 vs.
                                          group 2)           group 3)           group 3)

Expert A vs. Expert B (overlap)           NS                 NS                 NS
Expert A vs. Expert B (accuracy)          NS                 NS                 NS
Expert A vs. Expert B (sensitivity)       3.56x10^-? (S)     4.90x10^-? (S)     2.03x10^-? (S)
Expert A vs. Expert B (specificity)       NS                 NS                 NS

Table 13: Mean values for overlap, accuracy, sensitivity, and specificity
(expert A vs. expert B, benign masses)

Measurement    Mean Value,  Mean Value,  Mean Value,  Mean Value,  Mean Value,  Mean Value,
               Expert A     Expert B     Expert A     Expert B     Expert A     Expert B
               (group 1)    (group 1)    (group 2)    (group 2)    (group 3)    (group 3)

Overlap        0.42         0.35         0.57         0.50         0.48         0.44
Accuracy       0.90         0.88         0.91         0.89         0.85         0.83
Sensitivity    0.44         0.36         0.71         0.61         0.79         0.69
Specificity    0.99         0.99         0.94         0.94         0.86         0.86



IV.  DISCUSSION 


A.  Segmentation  Results 

From the ROIs shown in Figures 3 and 4 it is evident that the intensity corresponding to the
maximum value of the cost function is capable of accurately delineating the mass body contour,
although in some cases it produces a contour that falls inside the mass body contour. This can
be problematic because low segmentation sensitivities can produce large errors during the
feature calculation and diagnosis phases of CADx. Of the three available segmentation choices
for each mass, the first steepest change location appears to produce the contours that are most
strongly correlated with both gold standards. These contours appear to cover both the mass body
contour and the extended borders. In some instances the region grows into areas that are not
declared as mass areas by the gold standards (we call this flooding) and fails to grow into
other areas that have been declared as mass areas. Finally, the second steepest change location
produces contours that also cover both the mass body contour and the extended borders, but
these contours also tend to include surrounding fibroglandular tissue; hence, flooding is a
common occurrence.

In the cases shown, steepest change location 1 appears to produce the best contours in
comparison to the gold standards; however, only the ANOVA test results allow us to make such a
claim. The following discussion is divided into five sections: single observer malignant
results, single observer benign results, two observer results (malignant and benign),
algorithm performance, and an additional discussion on methods.

B.  Malignant  Cases  with  Single  Observer 

For both the Expert A and Expert B gold standards, Tables 2-5 show a statistically
significant difference between groups 1 and 2 on the basis of overlap and sensitivity, where the
mean values of group 2 were higher than the mean values of group 1 for these statistics. These
results are expected because, as shown in the figures, the group 2 contours consistently covered
more of the mass area (and correctly covered this mass area) than the group 1 contours,
according to both experts. There was a statistically significant difference in sensitivity
between group 1 and group 3, where the mean of group 3 was higher than the mean of group 1.
This is an expected result because, out of all the groups, the group 3 contours consistently
cover the most mass area. For the Expert B gold standard there was a statistically significant
difference in overlap between group 1 and group 3, where the mean of group 3 was higher than
the mean of group 1. This is an expected result because, out of all the groups, the group 3
contours correctly cover the most mass area.

C.  Benign  Cases  with  Single  Observer 

For the Expert A gold standard there were statistically significant differences between the
group 2 and group 3 traces on the basis of overlap, accuracy, and sensitivity, where the group 2
mean values for overlap and accuracy were higher than those of group 3 (see Tables 6-9). This
is an expected result because it is likely that many of the group 3 contours contained flooded
areas, which cause both of these values to be lower than for contours without flooded areas.
The overlap and sensitivity values for group 2 were significantly higher than those of group 1.
This is an expected result because the group 2 contours not only covered more mass area but
also correctly covered this area. Finally, the group 3 accuracy and sensitivity values differed
significantly from those for group 1, with group 3 achieving the higher sensitivity. This is an
expected result because the group 3 contours not only cover more mass area but also correctly
cover this area.

For the Expert B gold standard there were statistically significant differences between the
group 2 and group 3 traces on the basis of accuracy and sensitivity, where the group 2 mean
values for overlap and accuracy were higher than those of group 3. This is an expected result
because it is likely that many of the group 3 contours contained flooded areas, which cause
both of these values to be lower than for contours without flooded areas. There were
statistically significant differences between group 1 and group 2 on the basis of overlap and
sensitivity, where the mean values for group 2 were higher than the mean values for group 1.
This is an expected result because the group 2 contours not only cover more mass area but also
correctly cover this area. There were statistically significant differences between group 3 and
group 1 on the basis of overlap and sensitivity, where the mean values for group 3 were higher
than those of group 1. This is an expected result because the group 3 contours not only covered
more mass area but also correctly covered this area.

In nearly all cases for the single observer studies, it was expected that the specificity
values for group 1 would be higher than those for groups 2 and 3 because the group 1 contour
always covered the smallest mass area; consequently, its background was always highly
correlated with the background areas dictated by the gold standards. Moreover, in some cases
the group 2 and group 3 contours grew into areas that were regarded as background rather than
mass, so their specificity values had a lower correlation with the gold standard than those of
the group 1 contours.
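
The trade-off between sensitivity and specificity described above can be illustrated with a short sketch. It assumes the usual pixel-wise definitions (overlap as intersection over union, and accuracy, sensitivity, and specificity from true/false positive and negative pixel counts), which may differ in detail from those used in the study. The function name and the toy masks are hypothetical.

```python
def segmentation_metrics(computer, gold, n_pixels):
    """Pixel-wise agreement between a computer contour and a gold standard.

    `computer` and `gold` are sets of (row, col) pixels labeled as mass;
    `n_pixels` is the total number of pixels in the ROI. Assumes the gold
    standard contains both mass and background pixels.
    """
    tp = len(computer & gold)      # mass pixels found by both
    fp = len(computer - gold)      # flooded pixels (computer only)
    fn = len(gold - computer)      # missed mass pixels
    tn = n_pixels - tp - fp - fn   # background agreed on by both
    return {
        "overlap": tp / (tp + fp + fn),
        "accuracy": (tp + tn) / n_pixels,
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


def block(rows, cols):
    """Set of pixels in a rectangular block (helper for the toy example)."""
    return {(r, c) for r in rows for c in cols}


# Toy 10x10 ROI: a 6x6 gold standard mass, a small contour inside it,
# and a larger contour that floods beyond it.
gold = block(range(2, 8), range(2, 8))
small = segmentation_metrics(block(range(3, 6), range(3, 6)), gold, 100)
flooded = segmentation_metrics(block(range(2, 10), range(2, 8)), gold, 100)
print(small["sensitivity"], small["specificity"])      # 0.25 1.0
print(flooded["sensitivity"], flooded["specificity"])  # 1.0 0.8125
```

In this toy case the small contour reaches perfect specificity while missing most of the mass, whereas the flooded contour covers the whole mass at the cost of specificity.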

D.  Malignant  and  Benign  Cases  with  Two  Observers 

For  the  two  observer  studies,  comparisons  were  made  between  experts  A  and  B  on  a 
group-by-group  basis  in  an  effort  to  prove  that  there  were  significant  differences  between  the  two 
radiologists  on  the  basis  of  overlap,  accuracy,  sensitivity,  and  specificity  (see  Tables  10-13). 
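
A comparison of this kind can be sketched as a one-way ANOVA on the per-mass values obtained against each expert. The sketch below assumes a one-way design (the text says only "ANOVA"), uses hypothetical values, and computes only the F statistic and its degrees of freedom; converting F to a p-value requires the F distribution, for example via `scipy.stats.f_oneway`.

```python
def one_way_anova_f(*groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    if k < 2 or n_total <= k:
        raise ValueError("need at least two groups and more values than groups")
    means = [sum(g) / len(g) for g in groups]
    grand_mean = sum(sum(g) for g in groups) / n_total
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of the individual values around their own group mean.
    ss_within = sum(sum((x - m) ** 2 for x in g) for g, m in zip(groups, means))
    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical per-mass sensitivities against each expert, for illustration.
expert_a = [0.70, 0.75, 0.80]
expert_b = [0.55, 0.60, 0.65]
f_stat, df_b, df_w = one_way_anova_f(expert_a, expert_b)
print(round(f_stat, 2), df_b, df_w)  # 13.5 1 4
```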

For the malignant masses there were statistically significant differences between the two
experts on the basis of overlap, accuracy, and sensitivity. There was a statistically
significant difference between the two experts for group 3 on the basis of sensitivity. For the
benign masses, there were statistically significant differences between the two experts for all
three groups on the basis of sensitivity. For all cases Expert A’s values were consistently
higher than those of Expert B. It is an expected result that there were statistically
significant differences between the experts due to their differences in opinion. The fact that
Expert A’s mean values were higher than those for Expert B does not warrant the conclusion that
Expert A is a more reliable expert; however, it does warrant the conclusion that there is
stronger agreement between the computer’s results and Expert A’s traces. Further, there were
fewer statistically significant differences for the benign cases than for the malignant cases.
This is an expected result because, in general, benign masses have better defined borders, so
it was expected that the two experts would strongly agree.

E.  Algorithm  performance 

It appears that the thresholds chosen produce first steepest change location intensities that
generate contours that are closely correlated with the expert traces. In some instances the
second steepest change location is extremely far from the first steepest change location, which
implies that the function in question increases very slowly, and many of the second steepest
change location intensities produce contours with flooded areas. For the majority of the cases
in which the second steepest change location contour achieves a higher, but not significantly
higher, sensitivity value, we can still choose the first steepest change location contour
because the difference between the two contours is likely to be negligible.

In analyzing the probability-based cost functions, we found that functions with very steep
changes are typically associated with masses that have well-defined borders, while functions
that increase slowly are associated with masses that have ill-defined borders. This phenomenon
may make it necessary to develop an adaptive threshold process for the steepest change
evaluation in which the functions are grouped into categories (e.g., smooth versus steep),
because a threshold value that is optimal for a steep function may not be optimal for a smooth
function.

F.  Additional  discussion  on  methods  used 

In this study it appears that the steepest descent method has the advantage of locating
ill-defined margins as well as extensions such as malignant spiculations and projections for
mammographic masses. If the human eye alone is used, it can be difficult to separate the mass
from surrounding fibroglandular tissue. Therefore, it is believed that this method has the
potential to complement the process of reading mammographic films. One of the drawbacks of
the method is its dependence upon the assumption that masses are generally light in color. This
assumption impedes the region growing process because masses that contain darker areas and are
surrounded on one or more sides by bright tissue can cause contours to flood into areas that are
not actual mass tissue. Typically, this situation occurs for masses located on the border of
the breast region on a mammogram.

All of the segmentation methods surveyed in the introduction of this paper are excellent
solutions for the problems their authors set out to solve; however, in some cases it is
difficult to make comparisons between different methods without a set of several visual
results. In several studies the focus was either to detect masses or to distinguish malignant
from benign masses, so the validation process did not take the form of a comparison with expert
radiologist manual traces; rather, features were calculated on the potential mass candidates,
which were later classified as mass tissue or normal tissue. The purpose of Li’s study was to
distinguish normal and abnormal tissue, so the authors did not provide any statistics such as
overlap or accuracy. Nevertheless, the study contains a figure of 60 masses with both computer
and radiologist annotations to give the reader an idea of the computer algorithm’s performance.
Te Brake and Karssemeijer’s study used the overlap statistic to test the efficacy of their
method; they indicated that the central area of the mass was delineated by the radiologist and
that their computer results were compared to these annotations. Kupinski and Giger’s study also
used the overlap statistic to test the efficacy of their method and set a threshold above which
the mass was considered to be successfully segmented. For example, an overlap value greater
than 0.7 implies successful segmentation.

The technical method presented herein shows that the results obtained from the maximization
of the composed probability density function (i.e., the cost function) are equivalent to those
obtained from methods presented by previous investigators. However, the steepest change of the
composed probability density function is closest to the radiologists’ determination.

V.  CONCLUSION 

We have shown that our fully automatic boundary detection method for malignant and benign
masses can effectively delineate these masses using the intensities that correspond to the
first steepest change location within their cost functions. Additionally, it appears that the
method is more highly correlated with one set of expert traces than with a second set of expert
traces, regarding the accuracy and overlap statistics. This result shows that inter-observer
variability can be an important factor in segmentation algorithm design, and it has motivated us
to seek the opinions of more expert radiologists to test the robustness of our algorithm. The
second steepest change location intensity generally yields contours with higher sensitivity
values; however, it behooves us to choose the first steepest change location intensity because
it avoids the risk of choosing contours that contain substantial flooding. In future work, a
worthwhile study would be to run the experiments for different threshold values in an effort to
derive an optimal threshold procedure. We believe that such a procedure would improve the
method of choosing optimal contours.

APPENDIX A - Gallery of Segmentation Results

Fig. A1 - Segmentation results for a set of malignant masses

Fig. A2 - Segmentation results for a set of benign masses




